{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "Normalization vs Standardization.ipynb",
      "provenance": [],
      "collapsed_sections": [],
      "authorship_tag": "ABX9TyMvvrn6vu3auetNUwz/ERdj",
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/Tanu-N-Prabhu/Python/blob/master/Normalization_vs_Standardization.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "kZaIDvgLfxgx",
        "colab_type": "text"
      },
      "source": [
        "# **Normalization vs. Standardization: Which One Is Better?**\n",
        "\n",
        "## **In this tutorial, let us look in detail at these two feature engineering techniques and see which one works best.**"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "tvmb3CP3B2VB",
        "colab_type": "text"
      },
      "source": [
        "![alt text](https://cdn-images-1.medium.com/max/800/1*sS20G4rBJLNNwCzBs0g7gA.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "yro_GjvXB--A",
        "colab_type": "text"
      },
      "source": [
        "Feature engineering is the problem of transforming raw data into a dataset. Many feature engineering techniques are available. Two of the most widely used, and most commonly confused, are:\n",
        "\n",
        "\n",
        "\n",
        "1.   **Normalization**\n",
        "\n",
        "2.   **Standardization**\n",
        "\n",
        "\n",
        "Today, on this beautiful day or night, we will explore both of these techniques and look at some common assumptions that data analysts make while solving a data science problem.\n",
        "\n",
        "\n",
        "\n",
        "---\n",
        "\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "O9YtiH1yCMBV",
        "colab_type": "text"
      },
      "source": [
        "## **Normalization**\n",
        "\n",
        "### **Theory**\n",
        "\n",
        "Normalization is the process of converting the range of values that a numerical feature takes into a standard range of values, typically [-1, 1] or [0, 1]. For example, suppose we have a dataset with two features named \"**Age**\" and \"**Weight**\", as shown below:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "UtO69GyBbPSA",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "import pandas as pd"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "0vvD3ALPS8SU",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "X = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100]\n",
        "y = [5, 8, 13, 17, 27, 33, 36, 40, 50, 70, 78, 80, 100, 103, 108, 109, 113, 120, 123, 130]"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "jViPtGSfIcaY",
        "colab_type": "code",
        "outputId": "c881f590-6e6e-46b5-e8b3-1dae277c8b81",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 647
        }
      },
      "source": [
        "df = pd.DataFrame(list(zip(X, y)), columns =['Age', 'Weight'])\n",
        "df"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Age</th>\n",
              "      <th>Weight</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>5</td>\n",
              "      <td>5</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>10</td>\n",
              "      <td>8</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>15</td>\n",
              "      <td>13</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>20</td>\n",
              "      <td>17</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>25</td>\n",
              "      <td>27</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>5</th>\n",
              "      <td>30</td>\n",
              "      <td>33</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>6</th>\n",
              "      <td>35</td>\n",
              "      <td>36</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>7</th>\n",
              "      <td>40</td>\n",
              "      <td>40</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>8</th>\n",
              "      <td>45</td>\n",
              "      <td>50</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>9</th>\n",
              "      <td>50</td>\n",
              "      <td>70</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>10</th>\n",
              "      <td>55</td>\n",
              "      <td>78</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>11</th>\n",
              "      <td>60</td>\n",
              "      <td>80</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>12</th>\n",
              "      <td>65</td>\n",
              "      <td>100</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>13</th>\n",
              "      <td>70</td>\n",
              "      <td>103</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>14</th>\n",
              "      <td>75</td>\n",
              "      <td>108</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>15</th>\n",
              "      <td>80</td>\n",
              "      <td>109</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>16</th>\n",
              "      <td>85</td>\n",
              "      <td>113</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>17</th>\n",
              "      <td>90</td>\n",
              "      <td>120</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>18</th>\n",
              "      <td>95</td>\n",
              "      <td>123</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>19</th>\n",
              "      <td>100</td>\n",
              "      <td>130</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "    Age  Weight\n",
              "0     5       5\n",
              "1    10       8\n",
              "2    15      13\n",
              "3    20      17\n",
              "4    25      27\n",
              "5    30      33\n",
              "6    35      36\n",
              "7    40      40\n",
              "8    45      50\n",
              "9    50      70\n",
              "10   55      78\n",
              "11   60      80\n",
              "12   65     100\n",
              "13   70     103\n",
              "14   75     108\n",
              "15   80     109\n",
              "16   85     113\n",
              "17   90     120\n",
              "18   95     123\n",
              "19  100     130"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 22
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cGgvTjEBCdIY",
        "colab_type": "text"
      },
      "source": [
        "Suppose the actual range of the feature named \"**Age**\" is **5** to **100**. We can normalize these values into the range **[0, 1]** by subtracting **5** from every value of the \"**Age**\" column and then dividing the result by **95** (100 - 5). For example, an age of 50 becomes (50 - 5) / 95 ≈ 0.47. To make this concrete, we can write the above as a formula.\n",
        "\n",
        "![alt text](https://cdn-images-1.medium.com/max/800/1*i0oJBKdU7QgTLjwRTAvhIA.png)\n",
        "\n",
        "where min^(j) and max^(j) are the minimum and the maximum values of the feature j in the dataset.\n",
        "\n",
        "\n",
        "\n",
        "---\n",
        "\n"
      ]
    },
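    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "As a minimal sketch of the formula above, assuming it is the usual min-max form `(x - min) / (max - min)`, the rescaling can be done by hand with NumPy. The helper name `min_max_scale` is hypothetical and not part of any library, and the sketch assumes the feature is not constant (otherwise the denominator would be zero):\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "def min_max_scale(values):\n",
        "    # Hypothetical helper: rescale a 1-D array into the range [0, 1]\n",
        "    values = np.asarray(values, dtype=float)\n",
        "    lo, hi = values.min(), values.max()\n",
        "    return (values - lo) / (hi - lo)\n",
        "\n",
        "# The same Age values used in this notebook: 5, 10, ..., 100\n",
        "age = np.arange(5, 105, 5)\n",
        "age_scaled = min_max_scale(age)\n",
        "\n",
        "# Show the scaled values for the ages 5, 50 and 100\n",
        "print(age_scaled[[0, 9, 19]])\n",
        "```\n",
        "\n",
        "Here the age 5 maps to 0, the age 100 maps to 1, and the age 50 maps to 45 / 95 ≈ 0.4737."
      ]
    },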
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "GAsNqWILCubJ",
        "colab_type": "text"
      },
      "source": [
        "## **Implementation**\n",
        "\n",
        "Now that you know the theory, let's see how to put it into practice. As usual, there are two ways to implement this: the **traditional, old-school manual method** and the `sklearn.preprocessing` module. Today, let's use `sklearn` to perform normalization.\n",
        "\n",
        "\n",
        "### **Using sklearn preprocessing - Normalizer**\n",
        "\n",
        "\n",
        "Before feeding the \"**Age**\" and \"**Weight**\" values to the method, we convert these DataFrame columns into `numpy` arrays. To do this, we can use the `to_numpy()` method as shown below:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "e06UAlSfDPQg",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Store the Age column in X and the Weight column in y,\n",
        "# then convert both pandas Series to NumPy arrays\n",
        "X = df['Age'].to_numpy()\n",
        "y = df['Weight'].to_numpy()"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "TPuKBGQzDazU",
        "colab_type": "text"
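      },
      "source": [
        "The above step gives us plain arrays to work with. Both the `fit()` and the `transform()` methods expect a 2-D array-like input of shape (samples, features). Wrapping an array in a list, as in `[X]`, produces a 2-D input with a single row that holds all 20 values."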
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "IC_5t1EJDe7p",
        "colab_type": "code",
        "outputId": "3909ea30-d708-4194-fa7e-1332d5b23225",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 87
        }
      },
      "source": [
        "from sklearn.preprocessing import Normalizer\n",
        "normalizer = Normalizer().fit([X])\n",
        "normalizer.transform([X])"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "array([[0.01866633, 0.03733267, 0.055999  , 0.07466534, 0.09333167,\n",
              "        0.11199801, 0.13066434, 0.14933068, 0.16799701, 0.18666335,\n",
              "        0.20532968, 0.22399602, 0.24266235, 0.26132869, 0.27999502,\n",
              "        0.29866136, 0.31732769, 0.33599403, 0.35466036, 0.3733267 ]])"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 24
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "QO18KE1uDhTO",
        "colab_type": "code",
        "outputId": "6300fcdd-f522-4c99-9b09-566efeb7893c",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 87
        }
      },
      "source": [
        "normalizer = Normalizer().fit([y])\n",
        "normalizer.transform([y])"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "array([[0.01394837, 0.02231739, 0.03626577, 0.04742446, 0.07532121,\n",
              "        0.09205925, 0.10042828, 0.11158697, 0.13948372, 0.1952772 ,\n",
              "        0.2175946 , 0.22317395, 0.27896743, 0.28733646, 0.30128483,\n",
              "        0.3040745 , 0.3152332 , 0.33476092, 0.34312994, 0.36265766]])"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 25
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "KAQzQqjiDjXS",
        "colab_type": "text"
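      },
      "source": [
        "As seen above, both arrays have values in the range **[0, 1]**. Note, however, that `Normalizer` does not apply the min-max formula from the theory section: it rescales each row (here, the whole array) to unit L2 norm, which is why the smallest value is not 0 and the largest is not 1. The scikit-learn class that implements the min-max formula is `MinMaxScaler`. More details about the library can be found below:\n",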
        "\n",
        "[Pre-processing data](https://scikit-learn.org/stable/modules/preprocessing.html#preprocessing-normalization)\n",
        "\n",
        "\n",
        "\n",
        "---\n",
        "\n"
      ]
    },
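    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "For comparison, here is a minimal sketch that puts `MinMaxScaler` (column-wise min-max scaling) next to `Normalizer` (row-wise scaling to unit norm). It assumes the feature is passed as a single column to `MinMaxScaler` and as a single row to `Normalizer`; the variable names are illustrative only:\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "from sklearn.preprocessing import MinMaxScaler, Normalizer\n",
        "\n",
        "age = np.arange(5, 105, 5, dtype=float)  # 5, 10, ..., 100\n",
        "\n",
        "# MinMaxScaler works column by column, so the feature needs shape (n_samples, 1)\n",
        "age_minmax = MinMaxScaler().fit_transform(age.reshape(-1, 1)).ravel()\n",
        "\n",
        "# Normalizer works row by row: it divides each row by its L2 norm\n",
        "age_unit_norm = Normalizer().fit_transform(age.reshape(1, -1)).ravel()\n",
        "\n",
        "print(age_minmax.min(), age_minmax.max())  # ends of the min-max range\n",
        "print(np.linalg.norm(age_unit_norm))       # length of the normalized row\n",
        "```\n",
        "\n",
        "The min-max result spans the whole range from 0 to 1, while the `Normalizer` result is a vector of length 1 whose individual entries stay well below 1."
      ]
    },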
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1PoiW857Dyz-",
        "colab_type": "text"
      },
      "source": [
        "## **When should we actually normalize the data?**\n",
        "\n",
        "Normalization is not a strict requirement (a must-do thing), but it can help you in two ways:\n",
        "\n",
        "\n",
        "\n",
        "1.   Normalizing the data can **increase the speed of learning**, that is, reduce the time it takes to build (train) the model. Give it a try!\n",
        "\n",
        "2.   It helps avoid **numeric overflow**. What this really means is that normalization keeps our inputs in roughly the same, relatively small range. This avoids the problems that computers have when working with very small or very large numbers.\n",
        "\n",
        "\n",
        "\n",
        "---\n",
        "\n",
        "\n",
        "\n"
      ]
    },
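    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "A small illustration of the overflow point, offered as a sketch rather than a benchmark: the exponential function overflows 64-bit floats for large inputs but stays finite once the inputs are rescaled. The feature values below are made up for the example:\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "# Made-up feature values on a large scale\n",
        "raw = np.array([10.0, 500.0, 1000.0])\n",
        "\n",
        "# Min-max rescaling into [0, 1]\n",
        "scaled = (raw - raw.min()) / (raw.max() - raw.min())\n",
        "\n",
        "with np.errstate(over='ignore'):  # silence the overflow warning for this demo\n",
        "    exp_raw = np.exp(raw)         # exp(1000) does not fit into a float64\n",
        "exp_scaled = np.exp(scaled)\n",
        "\n",
        "print(np.isinf(exp_raw).any())        # True: at least one value overflowed to inf\n",
        "print(np.isfinite(exp_scaled).all())  # True: every value is finite\n",
        "```"
      ]
    },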
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Pvw_3vmVENCz",
        "colab_type": "text"
      },
      "source": [
        "## **Standardization**\n",
        "\n",
        "### **Theory**\n",
        "\n",
        "Standardization, or **z-score normalization** (not to be confused with **min-max scaling**, which is the normalization described above), is a technique for rescaling the values of a feature so that they have the properties of a standard normal distribution: **μ** = 0 (the mean, or average value, of the feature) and **σ** = 1 (the standard deviation from the mean). This can be written as:\n",
        "\n",
        "![alt text](https://cdn-images-1.medium.com/max/800/1*JAmQxAfwtO9AM1xTzMIIXw.png)\n",
        "\n",
        "## **Implementation**\n",
        "\n",
        "There are plenty of ways to implement standardization. Just as with normalization, we can use the `sklearn` library, this time with the `StandardScaler` class, as shown below:\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Hwk9aYiCEmVE",
        "colab_type": "code",
        "outputId": "6eb2b175-4ad4-432d-fd5d-302318da07e1",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 52
        }
      },
      "source": [
        "from sklearn.preprocessing import StandardScaler\n",
        "\n",
        "# StandardScaler standardizes each column, so every feature is reshaped into\n",
        "# a column of shape (n_samples, 1). Passing [X] would create a single row and\n",
        "# scale every value to 0.\n",
        "X_std = StandardScaler().fit_transform(X.reshape(-1, 1))\n",
        "y_std = StandardScaler().fit_transform(y.reshape(-1, 1))\n",
        "y_std.ravel()"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "fKERs5SBEpfn",
        "colab_type": "text"
      },
      "source": [
        "You can read more about the library at the link below:\n",
        "\n",
        "[Pre-processing data](https://scikit-learn.org/stable/modules/preprocessing.html#standardization-or-mean-removal-and-variance-scaling)\n",
        "\n",
        "\n",
        "\n",
        "---\n",
        "\n"
      ]
    },
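    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "As a hedged sketch of what standardization computes, the z-score can be written by hand and compared with `StandardScaler`. It assumes the feature is passed to `StandardScaler` as a single column (a single row would come out as all zeros) and relies on `StandardScaler` using the population standard deviation (`ddof=0`); the variable names are illustrative only:\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "from sklearn.preprocessing import StandardScaler\n",
        "\n",
        "# The Weight values used in this notebook\n",
        "weight = np.array([5, 8, 13, 17, 27, 33, 36, 40, 50, 70,\n",
        "                   78, 80, 100, 103, 108, 109, 113, 120, 123, 130], dtype=float)\n",
        "\n",
        "# Manual z-score: subtract the mean, divide by the standard deviation\n",
        "# (NumPy's std uses ddof=0, the population standard deviation, by default)\n",
        "weight_manual = (weight - weight.mean()) / weight.std()\n",
        "\n",
        "# StandardScaler expects shape (n_samples, n_features), so reshape to one column\n",
        "weight_sklearn = StandardScaler().fit_transform(weight.reshape(-1, 1)).ravel()\n",
        "\n",
        "# Check that both approaches agree\n",
        "print(np.allclose(weight_manual, weight_sklearn))\n",
        "```\n",
        "\n",
        "After the transformation the feature has a mean of (approximately) 0 and a standard deviation of 1."
      ]
    },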
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "2-OGMLhrExFX",
        "colab_type": "text"
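      },
      "source": [
        "## **Z-Score Normalization**\n",
        "\n",
        "Similarly, we can compute the z-score by hand with the pandas `mean` and `std` methods. Note that pandas `std` computes the sample standard deviation (`ddof=1`) by default, whereas `StandardScaler` uses the population standard deviation (`ddof=0`), so the two results differ slightly."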
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "qzNmGV5wE7iq",
        "colab_type": "code",
        "outputId": "d19b42bc-c4cf-4fd6-9ca5-fe7fe3128313",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 647
        }
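      },
      "source": [
        "# Standardize each column: subtract its mean and divide by its standard deviation.\n",
        "# The result is stored in a new DataFrame so that the original df stays unchanged.\n",
        "df_zscore = (df - df.mean()) / df.std()\n",
        "df_zscore"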
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Age</th>\n",
              "      <th>Weight</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>-1.605793</td>\n",
              "      <td>-1.458724</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>-1.436762</td>\n",
              "      <td>-1.389426</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>-1.267731</td>\n",
              "      <td>-1.273929</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>-1.098701</td>\n",
              "      <td>-1.181531</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>-0.929670</td>\n",
              "      <td>-0.950538</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>5</th>\n",
              "      <td>-0.760639</td>\n",
              "      <td>-0.811942</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>6</th>\n",
              "      <td>-0.591608</td>\n",
              "      <td>-0.742644</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>7</th>\n",
              "      <td>-0.422577</td>\n",
              "      <td>-0.650247</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>8</th>\n",
              "      <td>-0.253546</td>\n",
              "      <td>-0.419253</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>9</th>\n",
              "      <td>-0.084515</td>\n",
              "      <td>0.042734</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>10</th>\n",
              "      <td>0.084515</td>\n",
              "      <td>0.227529</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>11</th>\n",
              "      <td>0.253546</td>\n",
              "      <td>0.273727</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>12</th>\n",
              "      <td>0.422577</td>\n",
              "      <td>0.735714</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>13</th>\n",
              "      <td>0.591608</td>\n",
              "      <td>0.805012</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>14</th>\n",
              "      <td>0.760639</td>\n",
              "      <td>0.920509</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>15</th>\n",
              "      <td>0.929670</td>\n",
              "      <td>0.943608</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>16</th>\n",
              "      <td>1.098701</td>\n",
              "      <td>1.036006</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>17</th>\n",
              "      <td>1.267731</td>\n",
              "      <td>1.197701</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>18</th>\n",
              "      <td>1.436762</td>\n",
              "      <td>1.266999</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>19</th>\n",
              "      <td>1.605793</td>\n",
              "      <td>1.428694</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "         Age    Weight\n",
              "0  -1.605793 -1.458724\n",
              "1  -1.436762 -1.389426\n",
              "2  -1.267731 -1.273929\n",
              "3  -1.098701 -1.181531\n",
              "4  -0.929670 -0.950538\n",
              "5  -0.760639 -0.811942\n",
              "6  -0.591608 -0.742644\n",
              "7  -0.422577 -0.650247\n",
              "8  -0.253546 -0.419253\n",
              "9  -0.084515  0.042734\n",
              "10  0.084515  0.227529\n",
              "11  0.253546  0.273727\n",
              "12  0.422577  0.735714\n",
              "13  0.591608  0.805012\n",
              "14  0.760639  0.920509\n",
              "15  0.929670  0.943608\n",
              "16  1.098701  1.036006\n",
              "17  1.267731  1.197701\n",
              "18  1.436762  1.266999\n",
              "19  1.605793  1.428694"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 28
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "UbfbIeXkE-o4",
        "colab_type": "text"
      },
      "source": [
        "\n",
        "\n",
        "---\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "I-aLYkNWE_64",
        "colab_type": "text"
      },
      "source": [
        "## **Min-Max scaling**\n",
        "\n",
        "\n",
        "Here we can use the pandas `min` and `max` methods to apply the min-max formula to every column:\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "JzzW1lP5FFHW",
        "colab_type": "code",
        "outputId": "a95d4960-7681-4791-92d4-7f78463e322b",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 647
        }
      },
      "source": [
        "# Min-max scale each column: subtract its minimum and divide by its range (max - min)\n",
        "df_minmax = (df - df.min()) / (df.max() - df.min())\n",
        "df_minmax"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Age</th>\n",
              "      <th>Weight</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>0.000000</td>\n",
              "      <td>0.000</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>0.052632</td>\n",
              "      <td>0.024</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>0.105263</td>\n",
              "      <td>0.064</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>0.157895</td>\n",
              "      <td>0.096</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>0.210526</td>\n",
              "      <td>0.176</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>5</th>\n",
              "      <td>0.263158</td>\n",
              "      <td>0.224</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>6</th>\n",
              "      <td>0.315789</td>\n",
              "      <td>0.248</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>7</th>\n",
              "      <td>0.368421</td>\n",
              "      <td>0.280</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>8</th>\n",
              "      <td>0.421053</td>\n",
              "      <td>0.360</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>9</th>\n",
              "      <td>0.473684</td>\n",
              "      <td>0.520</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>10</th>\n",
              "      <td>0.526316</td>\n",
              "      <td>0.584</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>11</th>\n",
              "      <td>0.578947</td>\n",
              "      <td>0.600</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>12</th>\n",
              "      <td>0.631579</td>\n",
              "      <td>0.760</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>13</th>\n",
              "      <td>0.684211</td>\n",
              "      <td>0.784</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>14</th>\n",
              "      <td>0.736842</td>\n",
              "      <td>0.824</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>15</th>\n",
              "      <td>0.789474</td>\n",
              "      <td>0.832</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>16</th>\n",
              "      <td>0.842105</td>\n",
              "      <td>0.864</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>17</th>\n",
              "      <td>0.894737</td>\n",
              "      <td>0.920</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>18</th>\n",
              "      <td>0.947368</td>\n",
              "      <td>0.944</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>19</th>\n",
              "      <td>1.000000</td>\n",
              "      <td>1.000</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "         Age  Weight\n",
              "0   0.000000   0.000\n",
              "1   0.052632   0.024\n",
              "2   0.105263   0.064\n",
              "3   0.157895   0.096\n",
              "4   0.210526   0.176\n",
              "5   0.263158   0.224\n",
              "6   0.315789   0.248\n",
              "7   0.368421   0.280\n",
              "8   0.421053   0.360\n",
              "9   0.473684   0.520\n",
              "10  0.526316   0.584\n",
              "11  0.578947   0.600\n",
              "12  0.631579   0.760\n",
              "13  0.684211   0.784\n",
              "14  0.736842   0.824\n",
              "15  0.789474   0.832\n",
              "16  0.842105   0.864\n",
              "17  0.894737   0.920\n",
              "18  0.947368   0.944\n",
              "19  1.000000   1.000"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 30
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "FkQynhu1FKzm",
        "colab_type": "text"
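      },
      "source": [
        "Usually, **Z-score normalization** is preferred because min-max scaling is sensitive to **outliers**: a single extreme value defines the minimum or the maximum and squeezes all the other values into a small part of the range.\n",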
        "\n",
        "\n",
        "\n",
        "---\n",
        "\n"
      ]
    },
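    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "A minimal sketch of how a single extreme value affects the two techniques. The data is made up for illustration, and the z-score here uses NumPy's default population standard deviation:\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "# Made-up data: ten ordinary values and one extreme value\n",
        "values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1000], dtype=float)\n",
        "\n",
        "# Min-max scaling: the extreme value defines the maximum of the range\n",
        "minmax = (values - values.min()) / (values.max() - values.min())\n",
        "\n",
        "# Z-score: the result is not confined to a fixed range\n",
        "zscore = (values - values.mean()) / values.std()\n",
        "\n",
        "print(minmax[:10].max())  # largest scaled value among the ten ordinary points\n",
        "print(zscore[-1])         # z-score of the extreme value\n",
        "```\n",
        "\n",
        "With this data, min-max scaling places all ten ordinary values below 0.01. The z-score is also influenced by the extreme value, because it shifts the mean and inflates the standard deviation, so treat this as an illustration of the rule of thumb rather than a proof."
      ]
    },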
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3CY0YBXjFOCi",
        "colab_type": "text"
      },
      "source": [
        "## **When to actually use Standardization and Normalization?**\n",
        "\n",
        "\n",
        "There is no single answer to the above question. If you have a **small dataset** and **sufficient time**, you can experiment with both techniques and choose the one that works best. Below are some rules of thumb that you can follow:\n",
        "\n",
        "\n",
        "\n",
        "1.   **Unsupervised learning algorithms** in practice benefit from **standardization** more often than from normalization.\n",
        "\n",
        "2.   If the values of a feature follow a **bell curve** (they are close to normally distributed), then **standardization** is preferable.\n",
        "\n",
        "3.   If a feature has **extremely high** or **low values (outliers)**, then **standardization** is preferable, because normalization will usually **compress** the remaining values into a **small range**.\n",
        "\n",
        "In all other cases, normalization is preferable. Again, if you have enough time, experiment with both feature engineering techniques.\n",
        "\n",
        "\n",
        "\n",
        "---\n",
        "\n",
        "\n",
        "\n",
        "Alright, you guys have reached the end of the tutorial. I hope you learned a thing or two today. I used \"[The Hundred-Page Machine Learning Book by Andriy Burkov](http://themlbook.com/)\" (Chapter 5) as a reference to write this tutorial. You can have a look at it. If you have any questions about this tutorial, you can leave them in the comment section below. I will try to answer them as soon as possible. Until then, stay safe. Goodbye, and see you next time.\n",
        "\n"
      ]
    }
  ]
}